Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

Authors

  • Jie Hu
  • Shaobo Li
  • Yong Yao
  • Liya Yu
  • Guanci Yang
  • Jianjun Hu
Abstract

Many text mining tasks, such as text retrieval, text summarization, and text comparison, depend on extracting representative keywords from the main text. Most existing keyword extraction algorithms rely on a discrete bag-of-words representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for evaluating keyword extraction, based on information gain and on cross-validation with Support Vector Machine (SVM) classification; these measures are valuable when human-annotated keywords are not available. We evaluated PKEA on a standard benchmark dataset and on a patent dataset that we built ourselves. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that our proposed algorithm provides a promising way to extract keywords from patent texts for patent classification.
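To make the bag-of-words baselines named in the abstract concrete, here is a minimal sketch of TF-IDF keyword ranking in plain Python. It is not PKEA, whose details the abstract does not give. The function names (`tokenize`, `tfidf_keywords`), the crude tokenizer, the unsmoothed IDF formula, and the toy documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document.

    Assumed scoring: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Toy documents (invented for this sketch, not taken from the paper's dataset).
docs = [
    "lidar sensor measures distance with laser pulses lidar",
    "radar sensor measures distance with radio waves radar",
    "gps receiver estimates position from satellite signals gps",
]
top_terms = tfidf_keywords(docs, top_k=1)
```

Terms that appear in every document get an IDF of zero under this formula, which is why generic words drop to the bottom of the ranking. A distributed-representation method such as the one the abstract proposes would instead score words through learned vectors rather than raw counts.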

Similar resources

Identification of BKCa channel openers by molecular field alignment and patent data-driven analysis

In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...

An Efficient Patent Keyword Extractor As Translation Resource

The paper addresses the issue of resource reuse in patent translation. It presents an efficient patent keyword/phrase extraction tool and illustrates how the tool can be used in patent translation by both human experts and MT developers. The keyword extraction is based on a new hybrid methodology providing for intelligent output and computationally attractive properties. The tool is composed of...

Mining Patents Using Molecular Similarity Search

Text analytics is becoming an increasingly important tool used in biomedical research. While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical applications of these technologies arises. We developed a system that allows users to explore the US Patent corpus using molecular information. The core of our system contains thre...

Extraction of Keywords of Novelties From Patent Claims

There is a growing need for patent analysis using Natural Language Processing (NLP)-based approaches. Although NLP-based approaches can extract various information from patents, very few approaches have been proposed to extract the parts that inventors regard as novel or as involving an inventive step compared with all existing work. Extracting such parts is difficult even for human annotators ...

PATExpert: Semantic Processing of Patent Documentation

PATExpert is a recently started “Specific Targeted Research Project” funded by the EC in FP 6, IST priority. PATExpert’s goal is to change the paradigm currently followed for patent processing from textual to semantic. We are about to develop a semantic multimedia content representation based on Semantic Web technologies for selected technology areas and to investigate some central topics from ...

Journal title:
  • Entropy

Volume 20

Publication date: 2018